1 TCGCGGGAGC CAGAGGGCCC TGCGGTCCTC GGTGGTCTTG CCAGCCCCTC 
51 CTCATCCCAG GGCCCTCCGC GCCTGTGAGG ACTCCCTCAG GTCGGCCACG 
101 GGACCTGACG CAACAGGATG GACGAGTCCC CTGAGCCTCT GCAGCAGGGC 
151 AGAGGGCCGG TGCCGGTCCG ACGGCAGCGC CCAGCACCCC GGGGTCTGCG 
201 TGAGATGCTG AAGGCCAGGC TGTGGTGCAG CTGCTCGTGC AGTGTGCTGT 
251 GCGTCCGGGC GCTGGTGCAG GACCTGCTCC CCGCCACGCG CTGGCTGCGT 
301 CAGTACCGCC CGCGGGAGTA CCTGGCAGGC GACGTCATGT CTGGGCTGGT 
351 CATCGGCATC ATCCTGGTGC CGCAGGCCAT CGCCTACTCA TTGCTGGCCG 
401 GGCTGCAGCC CATCTACAGC CTCTATACGT CCTTCTTCGC CAACCTCATC
451 TACTTCCTCA TGGGCACCTC ACGGCATGTC TCCGTGGGCA TCTTCAGCCT 
501 GCTTTGCCTC ATGGTGGGGC AGGTGGTGGA CCGGGAGCTC CAGCTGGCCG 
551 GCTTTGACCC CTCCCAGGAC GGCCTGCAGC CCGGAGCCAA CAGCAGCACC 
601 CTCAACGGCT CGGCTGCCAT GCTGGACTGC GGGCGTGACT GCTACGCCAT 
651 CCGTGTCGCC ACCGCCCTCA CGCTGATGAC CGGGCTTTAC CAGGTCCTCA 
701 TGGGCGTCCT CCGGCTGGGC TTCGTGTCCG CCTACCTCTC ACAGCCACTG 
751 CTCGATGGCT TTGCCATGGG GGCCTCCGTG ACCATCCTGA CCTCGCAGCT
801 CAAACACCTG CTGGGCGTGC GGATCCCGCG GCACCAGGGG CCCGGCATGG 
851 TGGTCCTCAC ATGGCTGAGC CTGCTGCGCG GCGCCGGGCA GGCCAACGTG 
901 TGCGACGTGG TCACCAGCAC GGTGTGCCTG GCGGTGCTGC TAGCCGCGAA 
951 GGAGCTCTCA GACCGCTACC GACACCGCCT GAGGGTGCCG CTGCCCACGG 
1001 AGCTGCTGGT CATCGTGGTG GCCACACTCG TGTCGCACTT CGGGCAGCTC 
1051 CACAAGCGCT TTGGCTCGAG CGTGGCTGGC GACATCCCCA CGGGTTTCAT 
1101 GCCCCCTCAG GTCCCAGAGC CCAGGCTGAT GCAGCGTGTG GCTTTGGATG 
1151 CCGTGGCCCT GGCCCTCGTG GCTGCCGCCT TCTCCATCTC GCTGGCGGAG 
1201 ATGTTCGCCC GCAGTCACGG CTACTCTGTG CGTGCCAACC AGGAGCTGCT 
1251 GGCTGTGGGC TGCTGCAACG TGCTACCCGC CTTCCTCCAC TGCTTCGCCA 
1301 CCAGCGCCGC CCTGGCCAAG AGCCTGGTGA AGACAGCCAC TGGCTGCCGG 
1351 ACACAGCTGT CCAGCGTGGT CAGCGCCACC GTGGTGCTGC TGGTGCTGCT 
1401 GGCGCTGGCA CCGCTGTTCC ACGACCTACA GCGAAGCGTG CTGGCCTGCG
1451 TCATCGTGGT CAGCCTGCGG GGGGCCCTGC GCAAGGTGTG GGACCTCCCG
1501 CGGCTGTGGC GGATGAGCCC GGCTGACGCG CTGGTCTGGG CAGGCACCGC 
1551 GGCCACCTGT ATGCTGGTCA GCACAGAGGC CGGGCTGCTG GCTGGCGTCA 
1601 TCCTCTCGCT GCTCAGCCTG GCCGGCCGCA CCCAACGCCC ACGCACCGCC 
1651 CTGCTGGCCC GCATCGGGGA CACGGCCTTC TACGAGGATG CCACAGAGTT 
1701 CGAGGGCCTC GTCCCTGAGC CCGGCGTGCG GGTGTTCCGC TTTGGGGGGC
1751 CGCTGTACTA TGCCAACAAG GACTTCTTCC TGCAGTCACT CTACAGCCTC 
1801 ACGGGGCTGG ACGCAGGGTG CATGGCTGCC AGGAGGAAGG AGGGGGGCTC 
1851 AGAGACGGGG GTCGGTGAGG GAGGCCCTGC CCAGGGCGAG GACCTGGGCC 
1901 CGGTTAGCAC CAGGGCTGCG CTGGTGCCCG CAGCGGCCGG CTTCCACACA 
1951 GTGGTCATCG ACTGCGCCCC GCTGCTGTTC CTAGACGCAG CCGGTGTGAG 
2001 CACGCTGCAG GACCTGCGCC GAGACTACGG GGCCCTGGGC ATCAGCCTGC 
2051 TGCTAGCCTG CTGCAGCCCG CCTGTGAGAG ACATTCTGAG CAGAGGAGGC 
2101 TTCCTCGGGG AGGGCCCCGG GGACACGGCT GAGGAGGAGC AGCTGTTCCT
2151 CAGTGTGCAC GATGCCGTGC AGACAGCACG AGCCCGCCAC AGGGAGCTGG 
2201 AGGCCACCGA TGTCCATCTG TAGCAGGGCC AGGCCTGCCC AGCAGCCTCT 
2251 GCTCCCTCCT GGGGACCCAC AGCAGACGTC TGCAAGCCAC TGCTGAGACC 
2301 CTTCCCAGGG AGGAGCCACC CAAGAGCTGC ACTCTTGTGC CACAGCTGCC 
2351 CTGGGGAAAC CGGGGAACCC CAACTGGGAA AGGAGGCCCT CTGATCACAC 
2401 GCAGGACCCA AACACTCAGA AATCAAGAAC CTCTGCCTCC GAGACAGGCT
2451 GGCCCACAGT GCTGGCTGGG CCCCAATGCA CCGTCCCTCA GCTCAGAAGG 
2501 GATGGGCCTG ACCTGACGCT CAGGGTTGAC ATCTTATTTG AACAAGGGTC 
2551 CCCCGCCATC ATGCAGCCTC CAAGGTGCCA AGAGGACTCC CTATGCCCAG
2601 GCCTGCCCGG TGCCCACCCT GCTGGTAGGA GCCAGCGGCT CTGGCCAAGT 
2651 GCACGAGGGT CTCTGTGTTT CCAGAAGGCC CCACACACCC AAGTGCCCCT 
2701 CACACCTCGT GCCTCCCCCT CACAGGGTGG CCACCTGCAC CAGCGTCAGG 
2751 GCCCAGGGTG CTGTGACCGA TGAGACCTCA GCTCAGCCCT CAGGTGCAGT 
2801 GGCCCTACCC AGCCTGGCCA GCAGACACAC ACAGGGATGC TCACGGGTGC 
2851 ACCAGGAGCC AGGTGCGGCG CAGCCAACCC TGAGCCTGCA GGGAGACCTG 
2901 CAGGAAGCCC ACCGTGCCCC ATGCAGGGGC TCCCTCCAGC ACACAGCCCT
2951 CACCCCAGCA CAGCCAGCAA GGACACGCTC TCCCCAACAG GGTGCTTCGG 
3001 CGGGAGGTGG GGGAACAAGG GGTCTTCCGA GCAGCCCCCA GCCCTCCCCT 
3051 CCCATCTGTG CCTCTGTAAG GGGCTCTGGG ACGCCCAGAC CCTGCCCGCC 
3101 GCCCACCTGG TGGTGACAAG CTCCAGCAGC CAGTGGGTCC GGACCTGCTT 
3151 GATGCCGCGG TGAGGGACGG CGCCCACATA GGCGAGGTTG AGCTGCTGGT
3201 CCCAGCTGAG GACGTACTGG TCAGCCTGGC TGTGTGGCAG CGGGGGGCTG
3251 GGGACAACAA AGGGGCGGCT CAGTCCCGAG CCTCAGCATG GCTGGCAGCG
3301 CGGCTGACAC ACACGTTCAA GCCCAGGACT GCCCGGGCGC AGGATCCAGG
3351 CGCTGCCCGT GCGTTCAGTG ACTAATAAAA TGACCCTTAG GGCCAGGAAA
3401 AAAAAAAAAA AAAAAAAAAA AAAAAAAAAA AAAAA (SEQ ID NO:1)



FEATURES:

5' UTR: 1-117

Start Codon: 118

Stop Codon: 2221

3' UTR: 2224
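
The feature coordinates above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the function name `cds_summary` is hypothetical, and it assumes 1-based positions in which the stop-codon coordinate marks the first base of the stop triplet.

```python
def cds_summary(start_codon: int, stop_codon: int) -> dict:
    """Derive coding-region figures from 1-based start/stop codon positions.

    Assumption: stop_codon is the position of the first base of the stop
    triplet, so the coding bases run from start_codon to stop_codon - 1.
    """
    coding_nt = stop_codon - start_codon
    if coding_nt % 3 != 0:
        raise ValueError("start and stop codons are not in the same frame")
    return {
        "utr5_len": start_codon - 1,    # bases before the start codon
        "coding_nt": coding_nt,         # coding bases, stop codon excluded
        "codons": coding_nt // 3,       # encoded amino acids
        "utr3_start": stop_codon + 3,   # first base after the stop triplet
    }


summary = cds_summary(start_codon=118, stop_codon=2221)
print(summary)
```

Under these assumptions the result agrees with the listing: a 117-base 5' UTR, 701 codons (the length of SEQ ID NO:2), and a 3' UTR beginning at position 2224.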

HOMOLOGOUS PROTEINS:

Top BLAST Hits:

BLAST dbEST Hits:

EXPRESSION INFORMATION FOR MODULATORY USE:

library source (from BLAST dbEST hits):

gi|10209038 Lung
gi|7140527 Lymph
gi|5847932 Kidney

Tissue Screening Panels:

Human heart
Human Leukocyte
Thyroid
Pituitary
Brain
Fetal brain
Adrenal gland
Testis
Kidney
Small intestine
Pancreas
Liver
Lung
Placenta
Skeletal muscle
Spleen
HeLa cells

1 MDESPEPLQQ GRGPVPVRRQ RPAPRGLREM LKARLWCSCS CSVLCVRALV
51 QDLLPATRWL RQYRPREYLA GDVMSGLVIG IILVPQAIAY SLLAGLQPIY
101 SLYTSFFANL IYFLMGTSRH VSVGIFSLLC LMVGQVVDRE LQLAGFDPSQ
151 DGLQPGANSS TLNGSAAMLD CGRDCYAIRV ATALTLMTGL YQVLMGVLRL
201 GFVSAYLSQP LLDGFAMGAS VTILTSQLKH LLGVRIPRHQ GPGMVVLTWL
251 SLLRGAGQAN VCDVVTSTVC LAVLLAAKEL SDRYRHRLRV PLPTELLVIV
301 VATLVSHFGQ LHKRFGSSVA GDIPTGFMPP QVPEPRLMQR VALDAVALAL
351 VAAAFSISLA EMFARSHGYS VRANQELLAV GCCNVLPAFL HCFATSAALA
401 KSLVKTATGC RTQLSSVVSA TVVLLVLLAL APLFHDLQRS VLACVIVVSL
451 RGALRKVWDL PRLWRMSPAD ALVWAGTAAT CMLVSTEAGL LAGVILSLLS
501 LAGRTQRPRT ALLARIGDTA FYEDATEFEG LVPEPGVRVF RFGGPLYYAN 
551 KDFFLQSLYS LTGLDAGCMA ARRKEGGSET GVGEGGPAQG EDLGPVSTRA 
601 ALVPAAAGFH TVVIDCAPLL FLDAAGVSTL QDLRRDYGAL GISLLLACCS 
651 PPVRDILSRG GFLGEGPGDT AEEEQLFLSV HDAVQTARAR HRELEATDVH 
701 L (SEQ ID NO:2) 



FEATURES:

Functional domains and key regions: 

[1] PDOC00001 PS00001 ASN_GLYCOSYLATION 
N-glycosylation site 



Number of matches: 2 

1 158-161 NSST 

2 163-166 NGSA 
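
The two N-glycosylation matches listed above can be reproduced with a short pattern scan. This is a minimal sketch, not the tool used to generate the listing: it assumes the PROSITE PS00001 consensus N-{P}-[ST]-{P}, the helper name `find_sites` is hypothetical, and the fragment is residues 151-170 of SEQ ID NO:2.

```python
import re

# PROSITE PS00001 consensus N-{P}-[ST]-{P}, written as a regular expression.
# The lookahead wrapper lets overlapping candidate sites be reported.
N_GLYCOSYLATION = re.compile(r"(?=(N[^P][ST][^P]))")


def find_sites(fragment: str, offset: int) -> list:
    """Return (begin, end, motif) tuples using 1-based protein coordinates.

    offset is the protein position of the first residue in the fragment.
    """
    return [
        (m.start() + offset, m.start() + offset + 3, m.group(1))
        for m in N_GLYCOSYLATION.finditer(fragment)
    ]


fragment = "DGLQPGANSSTLNGSAAMLD"  # residues 151-170 of SEQ ID NO:2
sites = find_sites(fragment, offset=151)
for begin, end, motif in sites:
    print(f"{begin}-{end} {motif}")
```

Run on this fragment, the scan reports 158-161 NSST and 163-166 NGSA, matching the two entries above.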



[2] PDOC00005 PS00005 PKC_PHOSPHO_SITE 
Protein kinase C phosphorylation site 



Number of matches: 7 

1 117-119 TSR 

2 281-283 SDR 

3 370-372 SVR 

4 449-451 SLR 

5 505-507 TQR 

6 597-599 STR 

7 686-688 TAR 



[3] PDOC00006 PS00006 CK2_PHOSPHO_SITE 
Casein kinase II phosphorylation site 



Number of matches: 7

1 358-361 SLAE 

2 467-470 SPAD 

3 526-529 TEFE 

4 562-565 TGLD 

5 629-632 TLQD 

6 670-673 TAEE 

7 679-682 SVHD 



[4] PDOC00007 PS00007 TYR_PHOSPHO_SITE 
Tyrosine kinase phosphorylation site 



515-522 RIGDTAFY 



[5] PDOC00008 PS00008 MYRISTYL 
N-myristoylation site 

Number of matches: 15 

1 76-81 GLVIGI 

2 152-157 GLQPGA 

3 156-161 GANSST 

4 218-223 GASVTI 

5 255-260 GAGQAN 

 6 316-321 GSSVAG

 7 476-481 GTAATC

 8 489-494 GLLAGV

 9 493-498 GVILSL

 10 563-568 GLDAGC

 11 567-572 GCMAAR

 12 576-581 GGSETG

 13 577-582 GSETGV

 14 581-586 GVGEGG

 15 660-665 GGFLGE



[6] PDOC00012 PS00012 PHOSPHOPANTETHEINE
Phosphopantetheine attachment site 

411-426 RTQLSSVVSATVVLLV 

[7] PDOC00870 PS01130 SULFATE_TRANSP 
Sulfate transporters signature 

98-119 PIYSLYTSFFANLIYFLMGTSR 

Membrane spanning structure and domains:

Helix  Begin  End  Score  Certainty
1      73     93   1.663  Certain
2      98     118  1.558  Certain
3      121    141  0.813  Putative
4      180    200  1.400  Certain
5      209    229  1.017  Certain
6      259    279  1.008  Certain
7      291    311  1.227  Certain
8      344    364  1.585  Certain
9      377    397  1.343  Certain
10     414    434  2.107  Certain
11     483    503  1.446  Certain
12     602    622  0.977  Putative
13     635    655  0.897  Putative

BLAST Alignment to Top Hit:

>CRA|335001098671800 /altid=gi|11545741 /def=ref|NP_071325.1| solute
 carrier family 26 (sulfate transporter), member 1 [Homo
 sapiens] /org=Homo sapiens /taxon=9606 /dataset=nraa
 /length=701
 Length = 701

 Score = 1385 bits (3545), Expect = 0.0

 Identities = 698/701 (99%), Positives = 698/701 (99%)

 Frame = +1

Query: 1    MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRALVQDLLPATRWL 180
            MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRALVQDLLPATRWL
Sbjct: 1    MDESPEPLQQGRGPVPVRRQRPAPRGLREMLKARLWCSCSCSVLCVRALVQDLLPATRWL 60

Query: 181  RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH 360
            RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH
Sbjct: 61   RQYRPREYLAGDVMSGLVIGIILVPQAIAYSLLAGLQPIYSLYTSFFANLIYFLMGTSRH 120

Query: 361  VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV 540
            VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV
Sbjct: 121  VSVGIFSLLCLMVGQVVDRELQLAGFDPSQDGLQPGANSSTLNGSAAMLDCGRDCYAIRV 180

Query: 541  ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ 720
            ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ
Sbjct: 181  ATALTLMTGLYQVLMGVLRLGFVSAYLSQPLLDGFAMGASVTILTSQLKHLLGVRIPRHQ 240

Query: 721  GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV 900
            GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV
Sbjct: 241  GPGMVVLTWLSLLRGAGQANVCDVVTSTVCLAVLLAAKELSDRYRHRLRVPLPTELLVIV 300

Query: 901  VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA 1080
            VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA
Sbjct: 301  VATLVSHFGQLHKRFGSSVAGDIPTGFMPPQVPEPRLMQRVALDAVALALVAAAFSISLA 360

Query: 1081 EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA 1260
            EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA
Sbjct: 361  EMFARSHGYSVRANQELLAVGCCNVLPAFLHCFATSAALAKSLVKTATGCRTQLSSVVSA 420

Query: 1261 TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVWDLPRLWRMSPADALVWAGTAAT 1440
            TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVW  PRLWRMSPADALVWAGTAAT
Sbjct: 421  TVVLLVLLALAPLFHDLQRSVLACVIVVSLRGALRKVWGFPRLWRMSPADALVWAGTAAT 480

Query: 1441 CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF 1620
            CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF
Sbjct: 481  CMLVSTEAGLLAGVILSLLSLAGRTQRPRTALLARIGDTAFYEDATEFEGLVPEPGVRVF 540

Query: 1621 RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA 1800
            RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA
Sbjct: 541  RFGGPLYYANKDFFLQSLYSLTGLDAGCMAARRKEGGSETGVGEGGPAQGEDLGPVSTRA 600

Query: 1801 ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG 1980
            ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG
Sbjct: 601  ALVPAAAGFHTVVIDCAPLLFLDAAGVSTLQDLRRDYGALGISLLLACCSPPVRDILSRG 660

Query: 1981 GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDVHL 2103
            GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATD HL
Sbjct: 661  GFLGEGPGDTAEEEQLFLSVHDAVQTARARHRELEATDAHL 701 (SEQ ID NO: 4)

Hmmer search results (Pfam):

Model    Description                                   Score  E-value   N
PF00916  Sulfate transporter family                    405.6  4.7e-118  1
CE00008  E00008 GUANYLIN                                 8.6  0.016     1
PF00497  Bacterial extracellular solute-binding prote    4.4  0.57      1

Parsed for domains:

Model    Domain  seq-f  seq-t      hmm-f  hmm-t      score  E-value
PF00497  1/1     338    356    ..  1      27     [.  4.4    0.57
CE00008  1/1     409    431    ..  1      24     [.  8.6    0.016
PF00916  1/1     195    505    ..  1      328    []  405.6  4.7e-118

1 NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN NNNNNNNNNN
51 NNNNNNNNNN NNNTGGTGAA ACCCCGTCTC TACTAAAAAT ACAAAAAATT 
101 AGCCGGGCGT GGTGGCGGGT GCCTGTAGTC CCAGCTACTC GGGAGGCTGA 
151 GGCAGGAGAA TCACTTGAAC CCGGGAGACA GAGCTTGCAG TGAGCCGAGA 
201 TCATGCCACT GTACTCCAGC CTGGGCAACA GAGCGAAACT CCGTCTCAAA 
251 AAAAAAAAAA TTAGCCGGGC GCGGTGGCGG GCGCCTGTAG TCCCAGCTAC 
301 TCAGGAGGCT GAGGCAGGAG AATGGCGTGA ACCCAGGAGG CAGAGCTTCC
351 AGTGAGCCGA GATCACACCA CTGCATTCCG GCCTGGGTGA CAGAGCAAGA 
401 CTCCGCCTCA AAAAAAAAAA AAGAAAAGGT GGGGGGCGTC TCACTATGTT 
451 GACCAGGCTG GTCTTGAACT GCTGGCCTTA AGCGATCCTC CTGTCTAGGC 
501 CTCCCAAAGT GTTGGAATTA CAGGAGTGAA CCATCGTGCC TGGCTAATAA 
551 TTCCTTTTAA AAAGCAGCTT ACCCTTATTT TCACGTGTGG GCCTAATTTA 
601 GTTCACTTAA AAAAATCATT TATCTTCACC CCAGCCCTAT GAGGCAGGCA 
651 CTGCCGGTCC TGGTCTGTGG TAGAGGGGAG GGCAGAGGAG CCGTGAGGGT
701 GACCAGGCGC TGTGGGTCGG TGCTGGGTCC AGTCAGACCA GGACTCCTGG 
751 CCAGTCACGG CACCTTGACC CCGGCAGTCC TCGCCCTGGG CGGTGAGCAC 
801 CACACACAGG GCTTACGCGA GCACACACGC ATATGCACGC ACCGGCAGCC 
851 TTGGGCTGAG CCGGCTGTCA GCCTCTGCCC TGCTCCAGCT TGGACCAGGC 
901 TGGCTCCTTG CAGGACCAGG AGGGTGTCCG GCGACTGGAC ACGGAGACCA 
951 AGCCTCCCTC AGCCCCGCCT GGGTTTGAAG GCTGCTGCAC TCGACCCCAG 
1001 ACCCCAGAGC TGAAGGTTTA CCTGTGCTCA GCCCCTGAGC CCCCGCCTCC 
1051 CGCTGGTCCC TAAGCCCCCC CGGCAGGGCC GCAGAGCCAC AGCTGCAGCC 
1101 GCTCCTGGGA GGCTGGGAGC TCCTCAGAGG CCCACACAGC TCTAACTACT 
1151 ACAAGCCCCT GATTACAGTT CAACTCCCGG ATCAGCCGAT CAGGTAACAT
1201 GGCTGGAGAA ACCCGTGACT CAGCAATCTG TAGGTAAATA ATTGAACTAC 
1251 AGAGTCCAGG GCACAGACCA CTGCCTGCAG GTTGGCGCCA CCACCCCCAC 
1301 TCTCCCCGCT GCTCGCGGGA GCCAGAGGGC CCTGCGGTCC TCGGTGGTCT 
1351 TGCCAGCCCC TCGTCATCCC AGGGCCCTCC GCGCCTGTGA GGACTCCCTC 
1401 AGGTAAGAAC CATCCTGGGC CCAGATCTCA GCTGCAGCAG AGGGGGGCGT 
1451 GGGAGCCGAG GCCAGAAATG CCCTGGACTC GTGGTTTCTT AGGGGCACCC 
1501 TCAGGCTCAA GGCAGGTGGC CCTACTGTCC CCATTCCACA CACCTGGACC 
1551 CCAGGGGCTT GGGGTGGGCT TCAGGGCATC CAGGGACCCA GTGTGGTGGG 
1601 GTCTTCCAGG GAAGGGGACA CAACTCTTGC AATGTTGCCT GAGGGCCAGG 
1651 ACCCCCGCTC TGTGCCCCAG GGGTGCTGTG CCCAGCCTGC ATGTGTCAAC 
1701 CTACCAGGCT GGGCTCACTG CCCCAACACA CCCGCCAGGA GACTGGAGCT 
1751 CGCACACCCT GGGCCAGCGT GCAAACAGCA GGCTCAGCCC AGGCTCCAGG 
1801 GTGTCCTGGG CACCTGGTGT CCTGGGAGCA AAGTCTTTGC CTAACGTCGC 
1851 TGAGAAGAAT GTTTAAAGTG AAAGTACATT GGAGTCTGCA AACAGGACAG 
1901 ACCCGAGGCC TCACGTGGGA CCAGTCAGGC CTCTAAGCAC CGCCTCCCTA 
1951 ACGCCACGGT GTTTTCCGAG ATCAAGGGAA AGGTCAGGTG CCCTTCCGGC
2001 TCTGCCGGCC CAGGGTGACT GTGTGCAGCG GGCTGGGCCC TCTCGGTGCT 
2051 GCCTCGGGAC AGTGTGTCAT GGCCGTTCCA CAGTGAGCTG GTGCAGCCTG 
2101 GGAAAAAGGG CGCCTCACGT CCCAGAACTG TCTGGGCAGG GGAGACAGAC 
2151 GCCAGTCACC CTCCTCCCCT CCCAGCTGGC CCTGATGGGG CCCCCGTCCA 
2201 GGCATATTCT CAGAATTCTG TCCCAAGTCC AGGCGGATGG GCTAGGCTAG 
2251 TGTCTGAGTG CTGCTCCCCC AGCAGACTTG GGGTCCCAGT ACCCACAAAG 
2301 CTTGGCAGGG ACATAGGAGG CCTCTTCCTG AGACTTCCGC CAGCCCCAGG 
2351 ACCCACAGGG CAGGTGACAG AGGGGTGGGT GGAGGTGTCT CCAGGAGAGC 
2401 AGGCGATGGT TTGGATGGGG GAGGGAGGGC TCTGGTGTGG GCATGGGGTG
2451 GACAGCAGGA CCGTTTGCCA ACCTGGGGAG CCAGGGAGGT GGACACGGAG 
2501 CAGCTGGACT CAGGCTTGCC TGCACCTGTG TCCAGTGACT GTGACATTCT 
2551 GACGGTAGGC ACATGTGCGT GGTGGCAGCC CAGCCTGTTC CTGCCCCGTT 
2601 GGGGAGGTGA GCTTCAGGAG GCTACAGGGT GGTTTTCAGC CAGGAACCGC 
2651 AGAGCCAATA GGCCGGAGCT GAGCCTGGAC AGGGTGCCGC CACGCCGCCC
2701 CTCAGCACTG CTGGCCTCAG CACACCCCAT GGCATGGGCT TGGTGTCGTA 
2751 ATCCCATCTC ACCCCACGAT GGATTCTGGA TCCAGCAGGG CCCAGCGTCC 
2801 ATCCATACCG GGCAGGGGGC TGGGGCCCGC GCTGCCAGGA GAAGGCCCAG 
2851 CACCAATCCC CGGCCCTGGG TGGGCGAGGG GTCCGCCCCA AGGGGCCCGT 
2901 TGCTGCCGGG GACCTTGTCG TTTGGCCCTG GATCCGGGGG CTCCTGTGAC
2951 CATGCCCTCT TCTCGGCCGC AGGTCGGCCA CGGGACCTGA CGCAACAGGA 
3001 TGGACGAGTC CCCTGAGCCT CTGCAGCAGG GCAGAGGGCC GGTGCCGGTC 
3051 CGACGGCAGC GCCCAGCACC CCGGGGTCTG CGTGAGATGC TGAAGGCCAG 
3101 GCTGTGGTGC AGCTGCTCGT GCAGTGTGCT GTGCGTCCGG GCGCTGGTGC 
3151 AGGACCTGCT CCCCGCCACG CGCTGGCTGC GTCAGTACCG CCCGCGGGAG
3201 TACCTGGCAG GCGACGTCAT GTCTGGGCTG GTCATCGGCA TCATCCTGGT 
3251 GCCGCAGGCC ATCGCCTACT CATTGCTGGC CGGGCTGCAG CCCATCTACA 
3301 GCCTCTATAC GTCCTTCTTC GCCAACCTCA TCTACTTCCT CATGGGCACC 
3351 TCACGGCATG TCTCCGTGGG CATCTTCAGC CTGCTTTGCC TCATGGTGGG 
3401 GCAGGTGGTG GACCGGGAGC TCCAGCTGGC CGGCTTTGAC CCCTCCCAGG 
3451 ACGGCCTGCA GCCCGGAGCC AACAGCAGCA CCCTCAACGG CTCGGCTGCC 
3501 ATGCTGGACT GCGGGCGTGA CTGCTACGCC ATCCGTGTCG CCACCGCCCT 
3551 CACGCTGATG ACCGGGCTTT ACCAGGTGAG GAGCCCTGCT TGGGCACAGG 
3601 GAGGGGCCCA GGGCACCCCC CCTTAGGTTT TGGCCATCCA CGAGGGCAAG
3651 GCTGGGGGCA AGCACAGGGT TGGCAGAGGA GGTGCTGGCC CAAGACAGCA 
3701 AGGCTTGGGC AGAGCTGGGG CGTGCCGGGG CATCCCAGGG CGAGGCACCG 
3751 ACGCGGAGAG GCTGTGGATG CAGGAGGGGA GGGGCACGGG GAGCCAGTCC 
3801 GGTGGGCCAT GGCCTTGGTG GGGACCAGCA GGCCAGGTGT GGCTGTGGCT 
3851 CAGTGGTGCT GGACTGAGGC CATGTGGCCT CCCAGGCCTT CTGTCCTAGG 
3901 TGGAGTGGGG GATGGCCTCC CCACCCCCGA AGGTCTCCTG CCTTGGCCTG
3951 TCCACCTTGG CCCCCGTTGG CTCCACATCT GCATGGGGGG CAGTGGGCAC
4001 CATGTGTAGG AAGCAGCAGG AAGGGGTTGC CTTCTGATAC CAGAGGTCTT
4051 AATTCTGAAA TAAAACGGGC TGCTGCACGT GACAAGGGTT AGACGTGTCT 
4101 ATGGCCAGCT GTGTGCACGT GTGATGCTCA CGTGGATGTC ACAGTTGTCT
4151 GCGGGCATGA GCACGCGTGG AACCAGAACT CAGGCCCGTG TGAGGAGTCT 
4201 GGTTTGGAAC ACACGGGGCC GCAACACAGA ATTGTCAGGT CCTGTGCCGT 
4251 GACCACCACC CCTCGGGCCA TGCCAGGTGC TGGTGAGGGG CAGGTGGCTC 
4301 CCGCCAGGCG CCTGCTGGCC TGACCGCACT CCGTCCACAG GTCCTCATGG 
4351 GCGTCCTCCG GCTGGGCTTC GTGTCCGCCT ACCTCTCACA GCCACTGCTC 
4401 GATGGCTTTG CCATGGGGGC CTCCGTGACC ATCCTGACCT CGCAGCTCAA
4451 ACACCTGCTG GGCGTGCGGA TCCCGCGGCA CCAGGGGCCC GGCATGGTGG
4501 TCCTCACATG GCTGAGCCTG CTGCGCGGCG CCGGGCAGGC CAACGTGTGC 
4551 GACGTGGTCA CCAGCACGGT GTGCCTGGCG GTGCTGCTAG CCGCGAAGGA
4601 GCTCTCAGAC CGCTACCGAC ACCGCCTGAG GGTGCCGCTG CCCACGGAGC
4651 TGCTGGTCAT CGTGGTGGCC ACACTCGTGT CGCACTTCGG GCAGCTCCAC
4701 AAGCGCTTTG GCTCGAGCGT GGCTGGCGAC ATCCCCACGG GTTTCATGCC
4751 CCCTCAGGTC CCAGAGCCCA GGCTGATGCA GCGTGTGGCT TTGGATGCCG 
4801 TGGCCCTGGC CCTCGTGGCT GCCGCCTTCT CCATCTCGCT GGCGGAGATG 
4851 TTCGCCCGCA GTCACGGCTA CTCTGTGCGT GCCAACCAGG AGCTGCTGGC
4901 TGTGGGCTGC TGCAACGTGC TACCCGCCTT CCTCCACTGC TTCGCCACCA
4951 GCGCCGCCCT GGCCAAGAGC CTGGTGAAGA CAGCCACTGG CTGCCGGACA
5001 CAGCTGTCCA GCGTGGTCAG CGCCACCGTG GTGCTGCTGG TGCTGCTGGC 
5051 GCTGGCACCG CTGTTCCACG ACCTACAGCG AAGCGTGCTG GCCTGCGTCA 
5101 TCGTGGTCAG CCTGCGGGGG GCCCTGCGCA AGGTGTGGGA CCTCCCGCGG 
5151 CTGTGGCGGA TGAGCCCGGC TGACGCGCTG GTCTGGGCAG GCACCGCGGC 
5201 CACCTGTATG CTGGTCAGCA CAGAGGCCGG GCTGCTGGCT GGCGTCATCC 
5251 TCTCGCTGCT CAGCCTGGCC GGCCGCACCC AACGCCCACG CACCGCCCTG 
5301 CTGGCCCGCA TCGGGGACAC GGCCTTCTAC GAGGATGCCA CAGAGTTCGA 
5351 GGGCCTCGTC CCTGAGCCCG GCGTGCGGGT GTTCCGCTTT GGGGGGCCGC 
5401 TGTACTATGC CAACAAGGAC TTCTTCCTGC GGTCACTCTA CAGCCTCACG
5451 GGGCTGGACG CAGGGTGCAT GGCTGCCAGG AGGAAGGAGG GGGGCTCAGA 
5501 GACGGGGGTC GGTGAGGGAG GCCCTGCCCA GGGCGAGGAC CTGGGCCCGG 
5551 TTAGCACCAG GGCTGCGCTG GTGCCCGCAG CGGCCGGCTT CCACACAGTG 
5601 GTCATCGACT GCGCCCCGCT GCTGTTCCTA GACGCAGCTG GTGTGAGCAC 
5651 GCTGCAGGAC CTGCGCCGAG ACTACGGGGC CCTGGGCATC AGCCTGCTGC 
5701 TAGCCTGCTG CAGCCCGCCT GTGAGAGACA TTCTGAGCAG AGGAGGCTTC 
5751 CTCGGGGAGG GCCCCGGGGA CACGGCTGAG GAGGAGCAGC TGTTCCTCAG 
5801 TGTGCACGAT GCCGTGCAGA CAGCACGAGC CCGCCACAGG GAGCTGGAGG 
5851 CCACCGATGC CCATCTGTAG CAGGGCCAGG CCTGCCCAGC AGCCTCTGCT 
5901 CCCTCCTGGG GACCCACAGC AGACGTCTGC AAGCCACTGC TGAGACCCTT 
5951 CCCAGGGAGG AGCCACCCAA GAGCTGCACT CTTGTGCCAC AGCTGCCCTG 
6001 GGGAAACCGG GGAACCCCAA CTGGGAAAGG AGGCCCTCTG ATCACACGCA 
6051 GGACCCAAAC ACTCAGAAAT CAAGAACCTC TGCCTCCGAG ACAGGCTGGC 
6101 CCACAGTGCT GGCTGGGCCC CAATGCACCG TCCCTCAGCT CAGAAGGGAT 
6151 GGGCCTGACC TGACGCTCAG GGTTGACATC TTATTTGAAC AAGGGTCCCC 
6201 CGCCATCATG CAGCCTCCAA GGTGCCAAGA GGACTCCCTA TGCCCAGGCC 
6251 TGCCCGGTGC CCACCCTGCT GGTAGGAGCC AGCGGCTCTG GCCAAGTGCA 
6301 CGAGGGTCTC TGTGTTTCCA GAAGGCCCCA CACACCCAAG TGCCCCTCAC
6351 ACCTCGTGCC TCCCCCTCAC AGGGTGGCCA CCTGCACCAG CGTCAGGGCC 
6401 CAGGGTGCTG TGACCGATGA GACCTCAGCT CAGCCCTCAG GTGCAGTGGC 
6451 CCTACCCAGC CTGGCCAGCA GACACACACA GGGATGCTCA CGGGTGCACC 
6501 AGGAGCCAGG TGCGGCGCAG CCAACCCTGA GCCTGCAGGG AGACCTGCAG 
6551 GAAGCCCACC GTGCCCCATG CAGGGGCTCC CTCCAGCACA CAGCCCTCAG 
6601 CCCAGCACAG CCAGCAAGGA CACGCTCTCC CCAACAGGGT GCTTCGGCGG 
6651 GAGGTGGGGG AACAAGGGGT CTTCCGAGCA GCCCCCAGCC CTCCCCTCCC 
6701 ATCTGTGCCT CTGTAAGGGG CTCTGGGACG CCCAGACCCT GCCCGCCGCC 
6751 CACCTGGTGG TGACAAGCTC CAGCAGCCAG TGGGTCCGGA CCTGCTTGAT 
6801 GCCGCGGTGA GGGACGGCGC CCACATAGGC GAGGTTGAGC TGCTGGTCCC 
6851 AGCTGAGGAC GTACTGGTCA GCCTGGCTGT GTGGCAGCGG GGGGCTGGGG 
6901 ACAACAAAGG GGCGGCTCAG TCCCGAGCCT CAGCATGGCT GGCAGCGCGG 
6951 CTGACACACA CGTTCAAGCC CAGGACTGCC CGGGCGCAGG ATCCAGGCGC 
7001 TGCCCGTGCG TTCAGTGACT AATAAAATGA CCCTTAGGGC CAGGAATGTG
7051 GGGAGGTCCC ATCTTCATGG GGAACGGCAG CAGCAGTAAG ACGAGGGGCC 
7101 AACGCCAGCC CTGGCCCTGG CCCTGCCAGG AAGGCGGGTA CCTCAGCTCT 
7151 AGGTGGAAGG AATGGGACAG GCAGGCCAGG TCCCGCTGCA GGGCCGTCCA 
7201 CTCCCAGGGG AGACTCCTGG TTTACCTCAA AGAGCAGGAT CCCGGGCATC 
7251 GGCCTGGGCT GCAGGGGGCG GCCCAGGCTC ACGCCCCGGC GCCCACTCAG 
7301 GTGGAGGACC CACCCACAAA CACGGCGGGG GGCGGGCCCG GGAGAGCCAG 
7351 GGCCCCAGAG GAGGGAGCTC CGGTCTCTGA AGCTCTCACA GTGCGCAGTC 
7401 AGGGGGCGCC CGAGCTCTCC CCGTGCGGCC AGGGGGTCCC GGAGGCCGCG
7451 GAGCGCTCAC CAGAAGCCTG TGCTCCTCCA GAAGCGCCGC AGGGGCCACA
7501 GCGCGCGGGC CGCGTCCACC TGCACCAGGT GCGGGGCCTC GGCCGGGGCC
7551 ACCGGGGGTG CGGCCAGGAG CGAGGCCAGG AGCGCCAGCA GCGCTGCGCG
7601 GGGGCGCAGG GGACGCATGG CCACGCGTGC TCGGGGACTG CGGGGCTTCG
7651 GGCTGCACTG CCGGTTCCGC CTCCGGGTCG GAGTCTGGGC GCGCACCCCA
7701 TGTGACCGCC GCCGCGGGGC GGGGCCTTGG TGAGGGGGCG ATGGCCGGGT
7751 GGGAGGGGTT GGGTGGCCTC GGGGAGCCTC GGGGAGCCGG GAGCACGGCA
7801 GGGCTTGGAG CCCCGCTTCC TTGCGGGCCT CAGGGGCTGC TCTGAGGACC 
7851 GATGACTCGG AAAGCGCTCA GAAGAACGCT TCGCCCGTTG GTGCTATGTG
7901 AGTTGAGCCA TTACTGTCTT GTTTTTCTCT GTTTTTGTGT GTTTTTGAGA
7951 CAGAGTCTTG CTTTGTCGCC CAGGCTGAGG TGCAGTGGCG CGATCTCAGC
8001 TCACTGCAAC CTCCATCTCC GGGGCTTCAG CGATTTTCTC ACCCCAGCCT 
8051 CCTGAGTAAA GCGTGCGCTT TAGCAGGAAG GAGAATTACC CCAGAAGAGC 
8101 ACACTGGGCC CTCCTTACAC TTGGCTTCAG ATCCATGGAT TCAACCAAGC 
8151 AGACTGAAAA TATTGTTTTA AAGCCAAAGC AATACGAAAT AATACATATT 
8201 TTAAAACAAT ACAGTATAAC AGCTATTTAC AGAGCATTTA CATTGTTTTA 
8251 GGGACTATAA GTAATCTTGA TTTAAACTAC ACAGTAGGAT GTGCGTAGGT 
8301 AATGTGCAAA TACTGTGCCA TTTTATATCA AGTACTTGAG CACCTGCAAA 
8351 TTTTGGTATC TGGGAGGGTC CTGGAACCAA TACCCCGAGG ATACCATGGG 
8401 ACAACTGTAG TACATGTGTA GTCCATGTAT GCATGTGTGA ATCCAAGCAA
8451 ACATTGTATA AAAATAATAA TGGAAAGAAC AGGCTTGGTG CGGTGGCTCA 
8501 CACCTGCAAT CCCAGCACTT TGGAATTGCA GGCCAACACG GGAGGATCAC 
8551 TTGAGGCCTG GAGTTTGAAA TCGGCCTGGG AGATGTACCA AGACCCCATC
8601 TGTACAAAAA AAAAATTTAG CCAGATGCGA TGGTATATGC CTGTGAGGCC
8651 CAGCTACCCA CGAAATTGAG GTGGGAGATT GCTTGAGCTT AGGAGTTCAA 
8701 GGCTGAGACG GGCCATGATC ACACCACTAC ATTCCAGCCT GGTTGACAAA 
8751 ATGAGACCCC ATCTCTAAAA AAAGAAAAGA AAAAAAGAAC AGTCTACTAA
8801 CAAAACGAAA ATACTGGACA ATAATCCTCT CTAAGTTGGG AGAAGGATAA 
8851 TTAGAGTTAC AGTGTTCT (SEQ ID NO: 3) 



FEATURES:

Start: 3000
Exon: 3000-3575
Intron: 3576-4340
Exon: 4341-5867
Stop: 5868
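
The exon coordinates above can be checked against the length of the encoded protein. The sketch below is illustrative only: the function names are hypothetical, and it assumes 1-based inclusive coordinates with the stop codon lying just past the last coding exon position.

```python
def spliced_length(exons: list) -> int:
    """Total length of the exons, given 1-based inclusive (start, end) pairs."""
    return sum(end - start + 1 for start, end in exons)


def intron_spans(exons: list) -> list:
    """Gaps between consecutive exons, as 1-based inclusive (start, end) pairs."""
    return [
        (prev_end + 1, next_start - 1)
        for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
    ]


coding_exons = [(3000, 3575), (4341, 5867)]  # from the FEATURES list above

coding_nt = spliced_length(coding_exons)
print(coding_nt, coding_nt // 3)   # coding bases and codons
print(intron_spans(coding_exons))  # should reproduce the listed intron
```

Under these assumptions the two coding exons contribute 576 and 1527 bases, 2103 in total, which corresponds to 701 codons, and the single gap is 3576-4340, in agreement with the listed intron.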

One non-coding Exon in the 5' UTR:

(query = cDNA sequence; subject = genomic sequence)
Score = 174 bits (88), Expect = 7e-46
Identities = 91/92 (98%)
Strand = Plus/Plus 



Query: 1    tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcctcatcccag 60
            |||||||||||||||||||||||||||||||||||||||||||||||||| |||||||||
Sbjct: 1313 tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcgtcatcccag 1372

Query: 61   ggccctccgcgcctgtgaggactccctcaggt 92
            ||||||||||||||||||||||||||||||||
Sbjct: 1373 ggccctccgcgcctgtgaggactccctcaggt 1404



CHROMOSOME MAP POSITION: 

Chromosome 4 

ALLELIC VARIANTS: 

C/G nucleotide polymorphism at genomic position 1363 (in non-coding exon; see
cDNA/genomic sequence alignment above for the non-coding exon)

V/A amino acid polymorphism at protein position 699 
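
The genomic position of the C/G polymorphism can be recovered from the ungapped cDNA/genomic alignment of the non-coding exon. The following is a minimal sketch under stated assumptions: the helper name `find_mismatches` is hypothetical, the alignment is treated as gap-free Plus/Plus, and the two strings are the first 60 aligned bases (query 1-60, subject 1313-1372).

```python
def find_mismatches(query: str, sbjct: str, q_start: int, s_start: int) -> list:
    """List (query_pos, sbjct_pos, query_base, sbjct_base) for each mismatch.

    Assumes an ungapped, same-strand alignment, so both coordinates
    advance by one per aligned column.
    """
    return [
        (q_start + i, s_start + i, q, s)
        for i, (q, s) in enumerate(zip(query, sbjct))
        if q != s
    ]


query = "tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcctcatcccag"
sbjct = "tcgcgggagccagagggccctgcggtcctcggtggtcttgccagcccctcgtcatcccag"

mismatches = find_mismatches(query, sbjct, q_start=1, s_start=1313)
print(mismatches)
```

On these inputs the only mismatch is c (cDNA position 51) against g (genomic position 1363), which is the position cited for the C/G polymorphism.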